Flurex animal, PFOS and SCFA data
1 INFO
This document contains the commands necessary to analyse experimental data obtained from Flurex (internal project name: R20-22). The project data contains: - Animal weight data including calculated weights per bw and normalized weight data in decimals: + body weight (bw) from day 0 to day 8, including bw gain from day 0 - 8 + liver and cecum weight from dissection on day 8
- PFOS quantitative data:
- total dosed PFOS per rat on day 4 and 8 respectively (mg)
- blood day 4 and 8 in ug PFOS/mL serum including calculations on
- total blood volume per animal based on an standard average of 64mL blood / kilogram in rats “Diehl et al. 2001”
- total PFOS in blood volume (mg)
- total PFOS detected in blood from total dosed per day (pct)
- liver from dissection on day 8 in ug PFOS/g tissue including
calculations on
- total pfos in liver per rat (mg)
- total PFOS detected in liver from total dosed on day 8 (pct)
- Short-chain fatty acids quantification of 10 compounds in colonic
water given in mM from day 8:
- acetic acid, formic acid, propanoic acid, 2-methyl-propanoic acid, butanoic acid, 3-methyl-butanoic acid, pentanoic acid, 4-methyl-pentanoic acid, hexanoic acid and heptanoic acid
2 Setup
Following code loads packages, creates necessary folder and saves parameters for the following analyses.
knitr::opts_chunk$set(echo = TRUE)
# Load libraries
library(tidyverse)
library(phyloseq)
library(decontam)
library(pals)
library(ggpubr)
library(vegan)
library(phangorn)
library(kableExtra)
library(plotly)
library(rstatix)
library(forcats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggbreak)
# Create used folders if missing
if (!file.exists("R_objects")) dir.create(file.path(getwd(), "R_objects"))
if (!file.exists("plots")) dir.create(file.path(getwd(), "plots"))
if (!file.exists("plots/animal_data")) dir.create(file.path(getwd(), "plots/animal_data"))
if (!file.exists("scripts")) dir.create(file.path(getwd(), "scripts"))
# Save params
saveRDS(params, file = "R_objects/animal_params.RDS")3 LOAD DATA
Loading data from CSV-format and saves as Rdata-format.
params <- readRDS("R_objects/animal_params.RDS")## Error in eval(expr, envir, enclos): cannot change value of locked binding for 'params'
# Load analysis data
dat <- read.csv(params$input, header = TRUE, sep = ";", dec = ",")
save(dat, file = "R_objects/animal_data.Rdata")
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.4 ANIMAL WEIGHT DATA
Animal weight data contains data from body weight through the entire study period with calculated body weight gain, and organ weights from cecum and liver.
4.1 Body weight gain
This section will prepare to perform the data analysis for body weight gain
4.1.1 Statistics
4.1.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- dat
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bw_gain"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL bw_gain 12 17.1 2.26
## 2 PFOS bw_gain 12 17.2 4.04
## 3 VAN bw_gain 12 17.6 2.69
## 4 VAN+PFOS bw_gain 12 17.5 2.65
4.1.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 PFOS R30 18 yes no 258. 262. 265. 271. 270. 274
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.951 0.0457
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.530 0.664
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that body weight gain data has two outliers, has equal variance and is normally distributed without the outliers according to Shapiro-Wilk test. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.
4.1.1.3 ANOVA One-Way test
4.1.1.3.1 Perform test
If we had equality of variance we can now run a one-way ANOVA tests
anova_test() (if we have equal variance) or a
welch_anova_test() (if variance vary).
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 0.064 0.979 0.004
4.1.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 treat… CTRL PFOS 0 0.111 -3.15 3.37 1 ns
## 2 treat… CTRL VAN 0 0.463 -2.79 3.72 0.981 ns
## 3 treat… CTRL VAN+P… 0 0.374 -2.88 3.63 0.99 ns
## 4 treat… PFOS VAN 0 0.353 -2.90 3.61 0.991 ns
## 5 treat… PFOS VAN+P… 0 0.264 -2.99 3.52 0.996 ns
## 6 treat… VAN VAN+P… 0 -0.0893 -3.35 3.17 1 ns
4.1.2 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "Bodyweight gain",limits = c(5,25),breaks = seq(5,25,5), labels = function(x) paste0(x, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
# Output plot
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Body weight gain from day 0 to day 8
4.2 Cecum weight (normalized)
This section will prepare to perform the data analysis for normalized cecum weight data
4.2.1 Statistics
4.2.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_norm"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL cecum_norm 11 1 0.148
## 2 PFOS cecum_norm 12 0.944 0.164
## 3 VAN cecum_norm 12 1.88 0.24
## 4 VAN+PFOS cecum_norm 11 2.07 0.201
4.2.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R15 27 no yes 268. 277. 283. 290. 296. 300
## 2 VAN R24 36 no yes 281. 286. 294. 305. 309. 312
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.984 0.753
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 0.416 0.742
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that normalised cecum weight data has two outliers, is normally distribution and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.
4.2.1.3 ANOVA One-Way test
4.2.1.3.1 Perform test
If we had equality of variance we can now run a one-way ANOVA tests
anova_test() (if we have equal variance) or a
welch_anova_test() (if variance vary).
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 42 106.226 1.21e-19 * 0.884
4.2.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment CTRL PFOS 0 -0.0562 -0.271 0.158 8.96e- 1
## 2 treatment CTRL VAN 0 0.884 0.669 1.10 1.42e-12
## 3 treatment CTRL VAN+PFOS 0 1.07 0.850 1.29 1.06e-12
## 4 treatment PFOS VAN 0 0.940 0.730 1.15 1.09e-12
## 5 treatment PFOS VAN+PFOS 0 1.13 0.911 1.34 1.06e-12
## 6 treatment VAN VAN+PFOS 0 0.186 -0.0287 0.400 1.1 e- 1
## # ℹ 1 more variable: p.adj.signif <chr>
4.2.2 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% difference",limits = c(0.5,3.1),breaks = seq(0.5,3.1,0.5), labels = function(x) paste0(x*100, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.2,2.8,2.5,3.1))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Normalized cecum weights at day 8
4.3 Liver weight (normalized)
This section will prepare to perform the data analysis for normalized liver weight data
4.3.1 Statistics
4.3.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_norm"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL liver_norm 12 1 0.055
## 2 PFOS liver_norm 12 1.08 0.055
## 3 VAN liver_norm 12 0.934 0.044
## 4 VAN+PFOS liver_norm 12 1.03 0.065
4.3.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rækker> (eller 0-længde row.names)
Data contains zero outliers.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.985 0.778
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.430 0.733
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that normalised liver weight data has two outliers, is normally distribution and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honest significance test.
4.3.1.3 ANOVA One-Way test
4.3.1.3.1 Perform test
If we had equality of variance we can now run a one-way ANOVA tests
anova_test() (if we have equal variance) or a
welch_anova_test() (if variance vary).
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 15.909 3.79e-07 * 0.52
4.3.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment CTRL PFOS 0 0.0846 0.0246 0.145 0.0027
## 2 treatment CTRL VAN 0 -0.0663 -0.126 -0.00623 0.0254
## 3 treatment CTRL VAN+PFOS 0 0.0352 -0.0249 0.0953 0.409
## 4 treatment PFOS VAN 0 -0.151 -0.211 -0.0909 0.000000182
## 5 treatment PFOS VAN+PFOS 0 -0.0494 -0.110 0.0107 0.14
## 6 treatment VAN VAN+PFOS 0 0.102 0.0415 0.162 0.000269
## # ℹ 1 more variable: p.adj.signif <chr>
4.3.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% difference",limits = c(0.75,1.5),breaks = seq(0.75,1.5,0.25), labels = function(x) paste0(x*100, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(1.35,1.4,1.45,1.5))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Normalized liver weights at day 8
5 PFOS QUANTITATIVE DATA
Following section handles data analysis of PFOS from serum and liver samples (Run on Dionex Ultimate 3000 / Bruker EVOQ Elite UPLC-MS/MS against linear PPOS standard curve and with internal MPFOS standard).
5.1 Blood serum day 4
This section will prepare to perform the data analysis for PFOS data from serum on day 4.
5.1.1 ug/mL in serum
5.1.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum4_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum4_ugml"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_serum4_ugml 11 0 0.001
## 2 PFOS pfos_serum4_ugml 12 9.17 2.01
## 3 VAN pfos_serum4_ugml 11 0.001 0.001
## 4 VAN+PFOS pfos_serum4_ugml 10 10.0 1.18
5.1.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R06 6 no no 274. 276. 285. 296. 297. 302
## 2 CTRL R11 11 no no 271. 278. 287. 294. 293. 299
## 3 PFOS R29 17 yes no 239. 243. 248. 255. 259. 267
## 4 VAN R13 25 no yes 218. 222. 228. 234 231. 241
## 5 VAN R21 33 no yes 262. 268. 274. 285. 281. 291
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.807 0.00000412
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 40 7.48 0.000436
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.1.1.3 Kruskal-Wallis test
5.1.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_serum4_ugml 44 35.4 3 0.000000101 Kruskal-Wallis
5.1.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_serum4_ugml 44 0.809 eta2[H] large
5.1.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_serum4_… CTRL PFOS 11 12 3.86 1.15e-4 1.85e-4 ***
## 2 pfos_serum4_… CTRL VAN 11 11 0.0172 9.86e-1 9.86e-1 ns
## 3 pfos_serum4_… CTRL VAN+P… 11 10 4.53 5.87e-6 1.91e-5 ****
## 4 pfos_serum4_… PFOS VAN 12 11 -3.84 1.23e-4 1.85e-4 ***
## 5 pfos_serum4_… PFOS VAN+P… 12 10 0.863 3.88e-1 4.66e-1 ns
## 6 pfos_serum4_… VAN VAN+P… 11 10 4.51 6.36e-6 1.91e-5 ****
5.1.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/mL",limits = c(0,20),breaks = seq(0,20,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(14,17,15,14))
p## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS conc. in ug/mL at day 4
5.1.2 Total mg in serum
5.1.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_mg"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_mg))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.1.2.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rækker> (eller 0-længde row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum4_mg 0.955 0.717
## 2 VAN+PFOS pfos_serum4_mg 0.930 0.446
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 0.0274 0.870
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.
5.1.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.0171 0.165 0.183 pfos_s… PFOS VAN+P… 12 10 -1.23 0.232
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
.y.: the y variable used in the test.group1,group2: the compared groups in the pairwise tests.statistic: Test statistic used to compute the p-value.df: degrees of freedom.p: p-value.p.adj: the adjusted p-value.method: the statistical test used to compare groups.p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.alternative: a character string describing the alternative hypothesis.conf.low,conf.high: Lower and upper bound on a confidence interval.
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum4_mg PFOS VAN+PFOS -0.528 12 10 moderate
5.1.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,0.30),breaks = seq(0,0.30,0.1)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.28))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Total mg PFOS in blood volume
5.1.3 Pct.
Data for PFOS levels in serum at day 4 calculated from the total PFOS dosed at the time point. #### Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_pct"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_pct))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.1.3.1 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R29 17 yes no 239. 243. 248. 255. 259. 267
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum4_pct 0.937 0.456
## 2 VAN+PFOS pfos_serum4_pct 0.939 0.547
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 1.43 0.246
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
5.1.3.2 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.626 6.75 7.37 pfos_s… PFOS VAN+P… 12 10 -1.17 0.254
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
.y.: the y variable used in the test.group1,group2: the compared groups in the pairwise tests.statistic: Test statistic used to compute the p-value.df: degrees of freedom.p: p-value.p.adj: the adjusted p-value.method: the statistical test used to compare groups.p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.alternative: a character string describing the alternative hypothesis.conf.low,conf.high: Lower and upper bound on a confidence interval.
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum4_pct PFOS VAN+PFOS -0.503 12 10 moderate
5.1.3.3 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(3,10),breaks = seq(3,10,1)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS serum day 4 in pct. of total
5.2 Blood serum day 8
This section will prepare to perform the data analysis for PFOS data from serum on day 8.
5.2.1 ug/µL in serum
5.2.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum8_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum8_ugml"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_serum8_ugml 12 0.016 0.03
## 2 PFOS pfos_serum8_ugml 12 36.3 15.5
## 3 VAN pfos_serum8_ugml 12 0.011 0.021
## 4 VAN+PFOS pfos_serum8_ugml 12 32.2 10.7
5.2.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 7 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R05 5 no no 214 219. 222. 229. 231. 237
## 2 CTRL R09 9 no no 273. 278. 290. 290. 296. 302
## 3 CTRL R12 12 no no 297. 304. 316. 322. 324. 334
## 4 VAN R14 26 no yes 246. 256. 260. 267. 270. 274
## 5 VAN R16 28 no yes 256. 260 268. 275. 273 279
## 6 VAN R24 36 no yes 281. 286. 294. 305. 309. 312
## 7 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.823 0.00000461
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 8.92 0.0000993
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.2.1.3 Kruskal-Wallis test
5.2.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_serum8_ugml 48 37.3 3 0.0000000402 Kruskal-Wallis
5.2.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_serum8_ugml 48 0.779 eta2[H] large
5.2.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_serum8_… CTRL PFOS 12 12 4.37 1.26e-5 3.78e-5 ****
## 2 pfos_serum8_… CTRL VAN 12 12 -0.0899 9.28e-1 9.28e-1 ns
## 3 pfos_serum8_… CTRL VAN+P… 12 12 4.17 3.02e-5 4.53e-5 ****
## 4 pfos_serum8_… PFOS VAN 12 12 -4.46 8.32e-6 3.78e-5 ****
## 5 pfos_serum8_… PFOS VAN+P… 12 12 -0.195 8.46e-1 9.28e-1 ns
## 6 pfos_serum8_… VAN VAN+P… 12 12 4.26 2.03e-5 4.05e-5 ****
5.2.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/mL",limits = c(0,80),breaks = seq(0,80,10)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(72,80,75,72))
p## Warning: Removed 11 rows containing missing values (`geom_point()`).
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 11 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 11 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS conc. in ug/mL at day 8
5.2.2 Total mg in serum
5.2.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_mg"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_mg))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.2.2.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rækker> (eller 0-længde row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum8_mg 0.890 0.119
## 2 VAN+PFOS pfos_serum8_mg 0.902 0.168
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 1.98 0.173
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.
5.2.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 0.0931 0.717 0.624 pfos_s… PFOS VAN+P… 12 12 0.853 0.403
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum8_mg PFOS VAN+PFOS 0.348 12 12 small
5.2.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,2),breaks = seq(0,2,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(1.75))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Total mg PFOS in blood volume
5.2.3 Pct.
5.2.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_pct"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes" & !rat_name == "R47")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_pct))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.2.3.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rækker> (eller 0-længde row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum8_pct 0.867 0.0594
## 2 VAN+PFOS pfos_serum8_pct 0.899 0.181
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 21 2.86 0.106
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.
5.2.3.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 2.05 11.8 9.74 pfos_s… PFOS VAN+P… 12 11 1.29 0.212
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum8_pct PFOS VAN+PFOS 0.537 12 11 moderate
5.2.3.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(5,25),breaks = seq(5,25,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(24))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS serum day 8 in pct. of total
5.3 Blood serum day 4 and 8
This section will prepare to perform the data analysis for PFOS data from serum on day 4 and 8 collected.
5.3.1 Change from day 4 to 8 (Pct.)
5.3.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_change48_pct"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes") # add following to subset() to remove the outliers: & !rat_name %in% c("R47","R27"))
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_change48_pct))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.3.1.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R27 15 yes no 270. 280. 284. 291. 290. 296
## 2 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Any extreme outliers can be bad samples or errors in data entry. If outliers, compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_change48_pct 0.894 0.132
## 2 VAN+PFOS pfos_change48_pct 0.805 0.0167
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 0.639 0.434
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05Two outliers were identified (sample for R27 and R47). Analysis result and test method is similar with and without outliers. Data is normally distributed and has equal variance. Hence we use t-test.
5.3.1.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 85.1 341. 256. pfos_c… PFOS VAN+P… 12 10 1.11 0.281
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_change48_pct PFOS VAN+PFOS 0.475 12 10 small
5.3.1.4 Conclusion
5.3.1.5 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create point plot with mean and SD
data_summary <- function(x) {
m <- mean(x)
ymin <- m-sd(x)
ymax <- m+sd(x)
return(c(y=m,ymin=ymin,ymax=ymax))
}
data_summary_collapsed <- function(x) {
m <- mean(x)
ymin <- m
ymax <- m
return(c(y=m,ymin=ymin,ymax=ymax))
}
p <- ggplot(dat.clean, aes(x = .data[[PREDICTOR]], y = .data[[OUTCOME]], color = .data[[PREDICTOR]])) +
stat_summary(fun.data = data_summary_collapsed, geom = "crossbar", color = "black", width = 0.5, linewidth = 0.3) +
stat_summary(fun.data = data_summary, geom = "errorbar", color = "black", width = 0.15, linewidth = 0.5) +
geom_point(position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.4), size = 2, colour = "black", shape = 21, stroke = 0.5, aes(fill = treatment)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100), labels = function(x) paste0(x, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme_pubr()
p# Alternative: Create boxplot
# p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
# fill = PREDICTOR,
# add = "jitter",
# add.params = list(size = 1)) +
# scale_fill_manual(values = params$COL) +
# scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100)) +
# labs(fill = "Treatment") +
# scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2p# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS serum level change from day 4 to 8 in pct.
5.3.2 Data ug/mL
5.3.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")
# Subset data
dat.sub <- subset(dat, pfos == "yes")
# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_ugml, pfos_serum8_ugml) %>%
pivot_longer(., cols = c(pfos_serum4_ugml, pfos_serum8_ugml), names_to = "data_group", values_to = "conc")
# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_ugml","d8","d4"))
# Create ID column for easier handling
for (i in dat.sub$rat_name) {
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)
}
# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(conc))
dat.clean## rat_name treatment data_group conc day ID
## 1 R25 PFOS pfos_serum4_ugml 8.56 d4 d4 _ PFOS
## 3 R26 PFOS pfos_serum4_ugml 9.60 d4 d4 _ PFOS
## 5 R27 PFOS pfos_serum4_ugml 7.92 d4 d4 _ PFOS
## 7 R28 PFOS pfos_serum4_ugml 8.64 d4 d4 _ PFOS
## 9 R29 PFOS pfos_serum4_ugml 12.96 d4 d4 _ PFOS
## 11 R30 PFOS pfos_serum4_ugml 8.68 d4 d4 _ PFOS
## 13 R31 PFOS pfos_serum4_ugml 5.72 d4 d4 _ PFOS
## 15 R32 PFOS pfos_serum4_ugml 7.56 d4 d4 _ PFOS
## 17 R33 PFOS pfos_serum4_ugml 8.24 d4 d4 _ PFOS
## 19 R34 PFOS pfos_serum4_ugml 11.36 d4 d4 _ PFOS
## 21 R35 PFOS pfos_serum4_ugml 9.00 d4 d4 _ PFOS
## 23 R36 PFOS pfos_serum4_ugml 11.84 d4 d4 _ PFOS
## 25 R37 VAN+PFOS pfos_serum4_ugml 9.36 d4 d4 _ VAN+PFOS
## 27 R38 VAN+PFOS pfos_serum4_ugml 11.76 d4 d4 _ VAN+PFOS
## 29 R39 VAN+PFOS pfos_serum4_ugml 10.64 d4 d4 _ VAN+PFOS
## 31 R40 VAN+PFOS pfos_serum4_ugml 12.12 d4 d4 _ VAN+PFOS
## 33 R41 VAN+PFOS pfos_serum4_ugml 9.12 d4 d4 _ VAN+PFOS
## 35 R42 VAN+PFOS pfos_serum4_ugml 9.80 d4 d4 _ VAN+PFOS
## 37 R43 VAN+PFOS pfos_serum4_ugml 10.08 d4 d4 _ VAN+PFOS
## 43 R46 VAN+PFOS pfos_serum4_ugml 9.88 d4 d4 _ VAN+PFOS
## 45 R47 VAN+PFOS pfos_serum4_ugml 8.80 d4 d4 _ VAN+PFOS
## 47 R48 VAN+PFOS pfos_serum4_ugml 8.60 d4 d4 _ VAN+PFOS
## 2 R25 PFOS pfos_serum8_ugml 43.92 d8 d8 _ PFOS
## 4 R26 PFOS pfos_serum8_ugml 45.60 d8 d8 _ PFOS
## 6 R27 PFOS pfos_serum8_ugml 69.92 d8 d8 _ PFOS
## 8 R28 PFOS pfos_serum8_ugml 21.92 d8 d8 _ PFOS
## 10 R29 PFOS pfos_serum8_ugml 26.56 d8 d8 _ PFOS
## 12 R30 PFOS pfos_serum8_ugml 47.84 d8 d8 _ PFOS
## 14 R31 PFOS pfos_serum8_ugml 21.84 d8 d8 _ PFOS
## 16 R32 PFOS pfos_serum8_ugml 29.36 d8 d8 _ PFOS
## 18 R33 PFOS pfos_serum8_ugml 22.08 d8 d8 _ PFOS
## 20 R34 PFOS pfos_serum8_ugml 52.48 d8 d8 _ PFOS
## 22 R35 PFOS pfos_serum8_ugml 29.84 d8 d8 _ PFOS
## 24 R36 PFOS pfos_serum8_ugml 23.76 d8 d8 _ PFOS
## 26 R37 VAN+PFOS pfos_serum8_ugml 36.08 d8 d8 _ VAN+PFOS
## 28 R38 VAN+PFOS pfos_serum8_ugml 31.28 d8 d8 _ VAN+PFOS
## 30 R39 VAN+PFOS pfos_serum8_ugml 25.92 d8 d8 _ VAN+PFOS
## 32 R40 VAN+PFOS pfos_serum8_ugml 23.92 d8 d8 _ VAN+PFOS
## 34 R41 VAN+PFOS pfos_serum8_ugml 21.68 d8 d8 _ VAN+PFOS
## 36 R42 VAN+PFOS pfos_serum8_ugml 40.96 d8 d8 _ VAN+PFOS
## 38 R43 VAN+PFOS pfos_serum8_ugml 25.60 d8 d8 _ VAN+PFOS
## 40 R44 VAN+PFOS pfos_serum8_ugml 37.44 d8 d8 _ VAN+PFOS
## 42 R45 VAN+PFOS pfos_serum8_ugml 25.36 d8 d8 _ VAN+PFOS
## 44 R46 VAN+PFOS pfos_serum8_ugml 34.72 d8 d8 _ VAN+PFOS
## 46 R47 VAN+PFOS pfos_serum8_ugml 59.52 d8 d8 _ VAN+PFOS
## 48 R48 VAN+PFOS pfos_serum8_ugml 23.76 d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "conc"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## ID variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 d4 _ PFOS conc 12 9.17 2.01
## 2 d4 _ VAN+PFOS conc 10 10.0 1.18
## 3 d8 _ PFOS conc 12 36.3 15.5
## 4 d8 _ VAN+PFOS conc 12 32.2 10.7
5.3.2.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = COL)
bxp5.3.2.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 8
## ID rat_name treatment data_group conc day is.outlier is.extreme
## <chr> <chr> <chr> <chr> <dbl> <chr> <lgl> <lgl>
## 1 d4 _ PFOS R29 PFOS pfos_serum… 13.0 d4 TRUE FALSE
## 2 d8 _ VAN+PFOS R47 VAN+PFOS pfos_serum… 59.5 d8 TRUE FALSE
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.875 0.000151
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 6.46 0.00108
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that PFOS concentrations from day 4 and 8 has two outliers, has unequal variance, and falls short on the Shapiro-Wilk test of normality (not normally distributed). Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.3.2.4 Kruskal-Wallis test
5.3.2.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 conc 46 34.4 3 0.000000165 Kruskal-Wallis
5.3.2.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 conc 46 0.747 eta2[H] large
5.3.2.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 conc d4 _ PFOS d4 _ V… 12 10 0.798 4.25e-1 5.10e-1 ns
## 2 conc d4 _ PFOS d8 _ P… 12 12 4.68 2.92e-6 1.75e-5 ****
## 3 conc d4 _ PFOS d8 _ V… 12 12 4.48 7.51e-6 2.25e-5 ****
## 4 conc d4 _ VAN+PFOS d8 _ P… 10 12 3.66 2.51e-4 5.02e-4 ***
## 5 conc d4 _ VAN+PFOS d8 _ V… 10 12 3.47 5.15e-4 7.73e-4 ***
## 6 conc d8 _ PFOS d8 _ V… 12 12 -0.198 8.43e-1 8.43e-1 ns
5.3.2.5 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
scale_y_continuous(name = "ug/mL",limits = c(0,85),breaks = seq(0,85,10)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8"))
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(75,85,70,80))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS conc. in ug/mL at day 8
5.3.3 Data mg
5.3.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")
# Subset data
dat.sub <- subset(dat, pfos == "yes")
# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_mg, pfos_serum8_mg) %>%
pivot_longer(., cols = c(pfos_serum4_mg, pfos_serum8_mg), names_to = "data_group", values_to = "mg")
# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_mg","d8","d4"))
# Create ID column for easier handling
for (i in dat.sub$rat_name) {
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)
}
# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(mg))
dat.clean## rat_name treatment data_group mg day ID
## 1 R25 PFOS pfos_serum4_mg 0.1904292 d4 d4 _ PFOS
## 3 R26 PFOS pfos_serum4_mg 0.1592525 d4 d4 _ PFOS
## 5 R27 PFOS pfos_serum4_mg 0.1472486 d4 d4 _ PFOS
## 7 R28 PFOS pfos_serum4_mg 0.1480827 d4 d4 _ PFOS
## 9 R29 PFOS pfos_serum4_mg 0.2149079 d4 d4 _ PFOS
## 11 R30 PFOS pfos_serum4_mg 0.1498237 d4 d4 _ PFOS
## 13 R31 PFOS pfos_serum4_mg 0.1123500 d4 d4 _ PFOS
## 15 R32 PFOS pfos_serum4_mg 0.1359590 d4 d4 _ PFOS
## 17 R33 PFOS pfos_serum4_mg 0.1546747 d4 d4 _ PFOS
## 19 R34 PFOS pfos_serum4_mg 0.2121503 d4 d4 _ PFOS
## 21 R35 PFOS pfos_serum4_mg 0.1792512 d4 d4 _ PFOS
## 23 R36 PFOS pfos_serum4_mg 0.1811804 d4 d4 _ PFOS
## 25 R37 VAN+PFOS pfos_serum4_mg 0.1927112 d4 d4 _ VAN+PFOS
## 27 R38 VAN+PFOS pfos_serum4_mg 0.2344474 d4 d4 _ VAN+PFOS
## 29 R39 VAN+PFOS pfos_serum4_mg 0.1721467 d4 d4 _ VAN+PFOS
## 31 R40 VAN+PFOS pfos_serum4_mg 0.2381338 d4 d4 _ VAN+PFOS
## 33 R41 VAN+PFOS pfos_serum4_mg 0.1657651 d4 d4 _ VAN+PFOS
## 35 R42 VAN+PFOS pfos_serum4_mg 0.1652672 d4 d4 _ VAN+PFOS
## 37 R43 VAN+PFOS pfos_serum4_mg 0.2037289 d4 d4 _ VAN+PFOS
## 43 R46 VAN+PFOS pfos_serum4_mg 0.1702838 d4 d4 _ VAN+PFOS
## 45 R47 VAN+PFOS pfos_serum4_mg 0.1501491 d4 d4 _ VAN+PFOS
## 47 R48 VAN+PFOS pfos_serum4_mg 0.1332518 d4 d4 _ VAN+PFOS
## 2 R25 PFOS pfos_serum8_mg 1.0930000 d8 d8 _ PFOS
## 4 R26 PFOS pfos_serum8_mg 0.8230000 d8 d8 _ PFOS
## 6 R27 PFOS pfos_serum8_mg 1.3690000 d8 d8 _ PFOS
## 8 R28 PFOS pfos_serum8_mg 0.3980000 d8 d8 _ PFOS
## 10 R29 PFOS pfos_serum8_mg 0.4810000 d8 d8 _ PFOS
## 12 R30 PFOS pfos_serum8_mg 0.8450000 d8 d8 _ PFOS
## 14 R31 PFOS pfos_serum8_mg 0.4720000 d8 d8 _ PFOS
## 16 R32 PFOS pfos_serum8_mg 0.5750000 d8 d8 _ PFOS
## 18 R33 PFOS pfos_serum8_mg 0.4550000 d8 d8 _ PFOS
## 20 R34 PFOS pfos_serum8_mg 1.0550000 d8 d8 _ PFOS
## 22 R35 PFOS pfos_serum8_mg 0.6510000 d8 d8 _ PFOS
## 24 R36 PFOS pfos_serum8_mg 0.3890000 d8 d8 _ PFOS
## 26 R37 VAN+PFOS pfos_serum8_mg 0.7850000 d8 d8 _ VAN+PFOS
## 28 R38 VAN+PFOS pfos_serum8_mg 0.6670000 d8 d8 _ VAN+PFOS
## 30 R39 VAN+PFOS pfos_serum8_mg 0.4530000 d8 d8 _ VAN+PFOS
## 32 R40 VAN+PFOS pfos_serum8_mg 0.5010000 d8 d8 _ VAN+PFOS
## 34 R41 VAN+PFOS pfos_serum8_mg 0.4270000 d8 d8 _ VAN+PFOS
## 36 R42 VAN+PFOS pfos_serum8_mg 0.7600000 d8 d8 _ VAN+PFOS
## 38 R43 VAN+PFOS pfos_serum8_mg 0.5540000 d8 d8 _ VAN+PFOS
## 40 R44 VAN+PFOS pfos_serum8_mg 0.7520000 d8 d8 _ VAN+PFOS
## 42 R45 VAN+PFOS pfos_serum8_mg 0.4590000 d8 d8 _ VAN+PFOS
## 44 R46 VAN+PFOS pfos_serum8_mg 0.6440000 d8 d8 _ VAN+PFOS
## 46 R47 VAN+PFOS pfos_serum8_mg 1.0890000 d8 d8 _ VAN+PFOS
## 48 R48 VAN+PFOS pfos_serum8_mg 0.3980000 d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "mg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## ID variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 d4 _ PFOS mg 12 0.165 0.031
## 2 d4 _ VAN+PFOS mg 10 0.183 0.034
## 3 d8 _ PFOS mg 12 0.717 0.32
## 4 d8 _ VAN+PFOS mg 12 0.624 0.201
5.3.3.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = COL)
bxp5.3.3.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## [1] ID rat_name treatment data_group mg day is.outlier
## [8] is.extreme
## <0 rækker> (eller 0-længde row.names)
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.896 0.000640
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 9.67 0.0000564
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that PFOS concentrations from day 4 and 8 has two outliers, has unequal variance, and falls short on the Shapiro-Wilk test of normality (not normally distributed). Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.3.3.4 Kruskal-Wallis test
5.3.3.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 mg 46 34.1 3 0.000000187 Kruskal-Wallis
5.3.3.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 mg 46 0.741 eta2[H] large
5.3.3.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 mg d4 _ PFOS d4 _ V… 12 10 0.574 5.66e-1 6.79e-1 ns
## 2 mg d4 _ PFOS d8 _ P… 12 12 4.62 3.92e-6 2.35e-5 ****
## 3 mg d4 _ PFOS d8 _ V… 12 12 4.33 1.51e-5 4.54e-5 ****
## 4 mg d4 _ VAN+PFOS d8 _ P… 10 12 3.83 1.30e-4 2.60e-4 ***
## 5 mg d4 _ VAN+PFOS d8 _ V… 10 12 3.55 3.84e-4 5.75e-4 ***
## 6 mg d8 _ PFOS d8 _ V… 12 12 -0.289 7.73e-1 7.73e-1 ns
5.3.3.5 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
scale_y_continuous(name = "mg",limits = c(0,1.75),breaks = seq(0,1.75,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8")) +
theme(axis.title.x = element_blank())
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(1.635,1.75,1.4,1.52))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS amount in mg at day 4 and 8
5.4 Liver day 8
This section will prepare to perform the data analysis for PFOS data from liver on day 8.
5.4.1 ug/g in liver tissue
5.4.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_ugg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_ugg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_liver_ugg 12 0.199 0.173
## 2 PFOS pfos_liver_ugg 12 176. 21.7
## 3 VAN pfos_liver_ugg 12 0.205 0.22
## 4 VAN+PFOS pfos_liver_ugg 12 196. 18.5
5.4.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R11 11 no no 271. 278. 287. 294. 293. 299
## 2 VAN R16 28 no yes 256. 260 268. 275. 273 279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers: sample from rat_name R01 and R30.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.877 0.000119
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 12.2 0.00000631
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that body weight gain data has two outliers and has equal variance, however falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.4.1.3 Kruskal-Wallis test
5.4.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_liver_ugg 48 36.4 3 0.0000000608 Kruskal-Wallis
5.4.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_liver_ugg 48 0.760 eta2[H] large
5.4.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_liver_u… CTRL PFOS 12 12 3.62 2.98e-4 4.47e-4 ***
## 2 pfos_liver_u… CTRL VAN 12 12 -0.102 9.19e-1 9.19e-1 ns
## 3 pfos_liver_u… CTRL VAN+P… 12 12 4.68 2.85e-6 8.54e-6 ****
## 4 pfos_liver_u… PFOS VAN 12 12 -3.72 2.00e-4 4.00e-4 ***
## 5 pfos_liver_u… PFOS VAN+P… 12 12 1.06 2.87e-1 3.44e-1 ns
## 6 pfos_liver_u… VAN VAN+P… 12 12 4.78 1.72e-6 8.54e-6 ****
5.4.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/g",limits = c(0,300),breaks = seq(0,300,50)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(240,280,260,240))
p## Warning: Removed 1 rows containing missing values (`geom_point()`).
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage. ### mg in
liver tissue in all groups #### Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_mg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_liver_mg 12 0.002 0.002
## 2 PFOS pfos_liver_mg 12 2.11 0.305
## 3 VAN pfos_liver_mg 12 0.002 0.002
## 4 VAN+PFOS pfos_liver_mg 12 2.22 0.341
5.4.1.5 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R16 28 no yes 256. 260 268. 275. 273 279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains one not extreme outliers (R16).
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.861 0.0000433
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 16.5 0.000000257
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that mg PFOS in liver in all groups has one outlier, unequal variance, and falls short on the Shapiro-Wilk test of normality and is therefore not normally distributed. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
5.4.1.6 Kruskal-Wallis test
5.4.1.6.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_liver_mg 48 35.6 3 0.0000000927 Kruskal-Wallis
5.4.1.6.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_liver_mg 48 0.740 eta2[H] large
5.4.1.6.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_liver_mg CTRL PFOS 12 12 3.86 1.12e-4 1.67e-4 ***
## 2 pfos_liver_mg CTRL VAN 12 12 -0.146 8.84e-1 8.84e-1 ns
## 3 pfos_liver_mg CTRL VAN+P… 12 12 4.39 1.14e-5 3.42e-5 ****
## 4 pfos_liver_mg PFOS VAN 12 12 -4.01 6.08e-5 1.22e-4 ***
## 5 pfos_liver_mg PFOS VAN+P… 12 12 0.525 6.00e-1 7.20e-1 ns
## 6 pfos_liver_mg VAN VAN+P… 12 12 4.53 5.77e-6 3.42e-5 ****
5.4.1.7 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,3.5),breaks = seq(0,3.5,0.01)) +
scale_y_break(breaks = c(0.01,1), scales = 3, ticklabels = c(1.0,2.0,3.0), space = 0.3) +
labs(fill = "Treatment") +
theme(axis.title.x = element_blank(),
axis.line.y.right = element_blank(),
axis.text.y.right = element_blank(),
axis.ticks.y.right = element_blank()) +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(3.2,3,3.4,3.2))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS conc. in ug/g at day 8
5.4.2 Total mg in liver
5.4.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_mg))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.4.2.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rækker> (eller 0-længde row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_liver_mg 0.944 0.546
## 2 VAN+PFOS pfos_liver_mg 0.955 0.711
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.142 0.710
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.
5.4.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.115 2.11 2.22 pfos_l… PFOS VAN+P… 12 12 -0.870 0.394
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_liver_mg PFOS VAN+PFOS -0.355 12 12 small
5.4.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,3),breaks = seq(0,3,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(3))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.Total mg PFOS in total liver
5.4.3 Pct.
5.4.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_pct"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_pct))
# Will yoou run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}5.4.3.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R26 14 yes no 238. 246. 248 257 259. 265
## 2 PFOS R27 15 yes no 270. 280. 284. 291. 290. 296
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Any extreme outliers can be bad samples or errors in data entry. If outliers compare a test with and without the outlier to determine if it is important, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_liver_pct 0.985 0.997
## 2 VAN+PFOS pfos_liver_pct 0.937 0.456
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)If both Shapiro test has p > 0.05 and/ or the QQplot follows the reference line the data follows a normal distribution.
If the data does not follow the normal distribution run a Wilcoxon Rank-sum test
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.00119 0.973
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
Two outliers were identified but analysis does not differ in result when excluded. Data is normally distributed and has equal variance. Hence we use t-test.
5.4.3.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -2.30 35.0 37.3 pfos_l… PFOS VAN+P… 12 12 -1.46 0.159
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
.y.: the y variable used in the test.group1,group2: the compared groups in the pairwise tests.statistic: Test statistic used to compute the p-value.df: degrees of freedom.p: p-value.p.adj: the adjusted p-value.method: the statistical test used to compare groups.p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.alternative: a character string describing the alternative hypothesis.conf.low,conf.high: Lower and upper bound on a confidence interval.
Effect size
The effect size is calculated as Cohen’s D
dat.clean %>% cohens_d(FORMULA,
var.equal = EQUAL.VAR,
paired = FALSE)## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_liver_pct PFOS VAN+PFOS -0.595 12 12 moderate
5.4.3.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(25,45),breaks = seq(25,45,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(45))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS serum day 8 in pct. of total
5.5 Total PFOS detected on day 8
This section will prepare to perform the data analysis for total PFOS on day 8.
5.5.1 Analysis and Barplot
5.5.1.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","tot_pfos8","pfos_serum8_mg","pfos_liver_mg")]
# Calculate ratios
for (i in tmp$rat_name) {
tmp$total <- tmp$pfos_liver_mg + tmp$pfos_serum8_mg
}
for (i in tmp$rat_name) {
tmp$leftover <- tmp$tot_pfos8 - tmp$total
}
# Calculate percentage of detected PFOS in rats
for (i in tmp$rat_name) {
tmp$pct_det <- tmp$total / tmp$tot_pfos8 * 100
}
print("Group: PFOS")## [1] "Group: PFOS"
tmp %>% subset(treatment == "PFOS") %>% select(pct_det) %>% summary()## pct_det
## Min. :37.73
## 1st Qu.:42.16
## Median :44.60
## Mean :46.83
## 3rd Qu.:51.56
## Max. :57.59
print("Group: VAN+PFOS")## [1] "Group: VAN+PFOS"
tmp %>% subset(treatment == "VAN+PFOS") %>% select(pct_det) %>% summary()## pct_det
## Min. :41.29
## 1st Qu.:45.41
## Median :46.69
## Mean :47.88
## 3rd Qu.:49.49
## Max. :60.94
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pct_det"
SUBJECT <- "rat_name"
# Subset to a specific varible
dat.clean <- tmp
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
# identify outliers
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 1 × 11
## treatment rat_name ordering tot_pfos8 pfos_serum8_mg pfos_liver_mg total
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 VAN+PFOS R47 47 5.62 1.09 2.34 3.42
## # ℹ 4 more variables: leftover <dbl>, pct_det <dbl>, is.outlier <lgl>,
## # is.extreme <lgl>
# Run Shapiro test
dat.clean %>%
group_by(!!sym(PREDICTOR)) %>%
shapiro_test(!!sym(OUTCOME))## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pct_det 0.945 0.570
## 2 VAN+PFOS pct_det 0.879 0.0853
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)# Run test
dat.clean %>% levene_test(FORMULA)## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.946 0.341
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05Data contains one not extreme outlier, is normally distributed, and has equal variance. Therefore we perform unpaired two-tailed t-test.
5.5.1.2 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -1.05 46.8 47.9 pct_det PFOS VAN+P… 12 12 -0.447 0.659
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Result of t-test show that there is no significant difference between the total percentage of detected PFOS (where 100% is the total dose) between the two groups.
5.5.1.3 Create barplots
# Prepare data columns for barplot
tmp2 <- rbind(data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$leftover, "type" = "Unaccounted"),
data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_serum8_mg, "type" = "PFOS serum"),
data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_liver_mg, "type" = "PFOS liver"))
# Create plot per rat
p <- ggplot(tmp2, aes(x = rat_name, y = mg, fill = fct_rev(type))) +
geom_bar(position = "fill", stat = "identity") +
theme_pubr(legend = "top") +
facet_grid(~ treatment, scales = "free_x") +
labs(fill = "Sample type", x = "Rats", y = "% of total dosed") +
scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FEBE00", "PFOS serum" = "#A91B0D")) +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)
# Create plot for average per treatment group
p <- ggplot(tmp2, aes(x = treatment, y = mg, fill = fct_rev(type))) +
geom_bar(position = "fill", stat = "identity") +
theme_pubr(legend = "top") +
labs(fill = "Sample type", x = "Treatment",y = "% of total dosed") +
scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FEBE00", "PFOS serum" = "#A91B0D")) +
theme(axis.ticks.x=element_blank()) +
scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.6 SHORT CHAIN FATTY ACID DATA
Following section is handling data analysis of short chain fatty acids from colonic samples collected at day 8. Ten SCFAs are analysed by MS Omics A/S (Denmark), delivered as concentrations in millimolar (mM), and tested accordingly here.
6.1 Formic acid / Formate
6.1.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "formic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$formic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL formic 12 0.069 0.239
## 2 PFOS formic 12 0.244 0.484
## 3 VAN formic 12 0.12 0.283
## 4 VAN+PFOS formic 12 0.298 0.466
6.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp6.1.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 6 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 PFOS R31 19 yes no 283. 284. 293. 301. 307. 315
## 4 PFOS R32 20 yes no 255. 262. 269. 276. 281 291
## 5 VAN R14 26 no yes 246. 256. 260. 267. 270. 274
## 6 VAN R17 29 no yes 268. 278. 274. 290. 295 295
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.732 0.0000000500
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.922 0.438
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Formic acid data contains vary little data above Limit of Detection (= 0.6) with a total of 10 valid data points. There are no outliers, Shapiro-Wilk test shows no normality and Levene test shows equal variance. We use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment. ### Kruskal-Wallis test
6.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 formic 48 2.44 3 0.486 Kruskal-Wallis
6.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 formic 48 -0.0127 eta2[H] small
6.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 formic CTRL PFOS 12 12 0.986 0.324 0.648 ns
## 2 formic CTRL VAN 12 12 0.400 0.689 0.689 ns
## 3 formic CTRL VAN+PFOS 12 12 1.45 0.148 0.648 ns
## 4 formic PFOS VAN 12 12 -0.585 0.558 0.689 ns
## 5 formic PFOS VAN+PFOS 12 12 0.462 0.644 0.689 ns
## 6 formic VAN VAN+PFOS 12 12 1.05 0.295 0.648 ns
6.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM formate",limits = c(0,1.6),breaks = seq(0,1.6,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p## Warning: Removed 22 rows containing missing values (`geom_point()`).
p.formic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.formic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 22 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 22 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.2 Acetic acid / Acetate
6.2.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "acetic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(OUTCOME) & !dat$rat_name == "R08")
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL acetic 11 7.84 1.23
## 2 PFOS acetic 12 7.19 2.49
## 3 VAN acetic 12 3.02 0.674
## 4 VAN+PFOS acetic 12 3.29 1.04
6.2.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp6.2.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R10 10 no no 266. 273. 275. 285. 291. 294
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers, where one is extreme (R08). This outlier has been removed from the analysis.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.966 0.189
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 43 8.35 0.000174
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Most data for this set is above Limit of Detection (= 0.5). This shows that SCFA concentration has two outliers, where one is extreme and has been removed from analysis. Shapiro-Wilk test show normality but the data has unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.2.4 Kruskal-Wallis test
6.2.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 acetic 47 31.4 3 0.000000692 Kruskal-Wallis
6.2.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 acetic 47 0.661 eta2[H] large
6.2.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 acetic CTRL PFOS 11 12 -0.478 0.633 0.743 ns
## 2 acetic CTRL VAN 11 12 -4.31 0.0000165 0.0000992 ****
## 3 acetic CTRL VAN+PFOS 11 12 -3.99 0.0000670 0.000181 ***
## 4 acetic PFOS VAN 12 12 -3.92 0.0000903 0.000181 ***
## 5 acetic PFOS VAN+PFOS 12 12 -3.59 0.000333 0.000500 ***
## 6 acetic VAN VAN+PFOS 12 12 0.328 0.743 0.743 ns
6.2.4.1 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM acetate",limits = c(0,15),breaks = seq(0,15,2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(14,15,12,13))
pp.acetic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.acetic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.PFOS amount in mg at day 4 and 8
6.3 Propanoic acid / Propionate
6.3.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$propanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL propanoic 12 0.268 0.113
## 2 PFOS propanoic 12 0.265 0.095
## 3 VAN propanoic 12 0.264 0.086
## 4 VAN+PFOS propanoic 12 0.325 0.102
6.3.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 6 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 VAN R15 27 no yes 268. 277. 283. 290. 296. 300
## 4 VAN R18 30 no yes 266. 275. 282. 285. 288. 298
## 5 VAN R22 34 no yes 292. 296. 301. 313. 311. 321
## 6 VAN+PFOS R43 43 yes yes 292. 301. 300. 313. 316. 322
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains five outliers, where none are extreme. As removing outliers does not affect final outcome they are left in the analysis.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.980 0.586
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.280 0.840
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that propanoic acid concentration has 5 outliers, Shapiro-Wilk test shows normality and Levene test shows equal variance. Therefore we will use one-way ANOVA test with Tukey’s honest significance test. ### ANOVA One-Way test
6.3.2.1 Perform test
If we had equality of variance we can now run a one-way ANOVA tests
anova_test() (if we have equal variance) or a
welch_anova_test() (if variance vary).
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 1.068 0.372 0.068
6.3.2.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 treat… CTRL PFOS 0 -0.00295 -0.111 0.105 1 ns
## 2 treat… CTRL VAN 0 -0.00466 -0.113 0.104 0.999 ns
## 3 treat… CTRL VAN+P… 0 0.0566 -0.0516 0.165 0.508 ns
## 4 treat… PFOS VAN 0 -0.00171 -0.110 0.107 1 ns
## 5 treat… PFOS VAN+P… 0 0.0596 -0.0487 0.168 0.464 ns
## 6 treat… VAN VAN+P… 0 0.0613 -0.0470 0.170 0.44 ns
6.3.3 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM propionate",limits = c(0,0.6),breaks = seq(0,0.6,0.2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p## Warning: Removed 1 rows containing missing values (`geom_point()`).
p.propanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.propanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.4 2-methyl-Propanoic acid / Isobutyrate
6.4.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m2_propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m2_propanoic)) # & !dat$rat_name == "R01")
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m2_propanoic 12 0.049 0.025
## 2 PFOS m2_propanoic 12 0.05 0.017
## 3 VAN m2_propanoic 12 0.005 0.018
## 4 VAN+PFOS m2_propanoic 12 0.01 0.027
6.4.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 3 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 4 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## 5 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains one outliers, where one is extreme (R01). This outlier has been removed from the analysis -> leading to new outlier but not extreme.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.755 0.000000145
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.633 0.598
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 2-methyl-propanoic acid concentration has five outliers of which four are extreme outlier - these arise from several samples = 0, and based on this they will be left in. Shapiro-Wilk test show normality but the data has unequal variance. Furthermore, very few samples above Limit of Detection (= 0.02) are observed in vancomycin treated samples. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.4.3 Kruskal-Wallis test
6.4.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m2_propanoic 48 26.2 3 0.00000882 Kruskal-Wallis
6.4.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m2_propanoic 48 0.526 eta2[H] large
6.4.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m2_propanoic CTRL PFOS 12 12 0.0383 9.69e-1 9.69e-1 ns
## 2 m2_propanoic CTRL VAN 12 12 -3.73 1.94e-4 5.82e-4 ***
## 3 m2_propanoic CTRL VAN+PF… 12 12 -3.46 5.44e-4 8.15e-4 ***
## 4 m2_propanoic PFOS VAN 12 12 -3.76 1.67e-4 5.82e-4 ***
## 5 m2_propanoic PFOS VAN+PF… 12 12 -3.50 4.71e-4 8.15e-4 ***
## 6 m2_propanoic VAN VAN+PF… 12 12 0.268 7.88e-1 9.46e-1 ns
6.4.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isobutyrate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.102,0.11,0.096,0.088))
p## Warning: Removed 13 rows containing missing values (`geom_point()`).
p.m2p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m2p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 13 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 13 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.5 Butanoic acid / Butyrate
6.5.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$butanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL butanoic 12 1.28 0.655
## 2 PFOS butanoic 12 1.14 0.625
## 3 VAN butanoic 12 0.187 0.195
## 4 VAN+PFOS butanoic 12 0.193 0.109
6.5.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R20 32 no yes 285 293. 301. 310. 317. 321
## 2 VAN+PFOS R48 48 yes yes 224. 229. 234. 239. 242. 250
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers, where one is extreme (R20). This outlier does not affect the final results or type of analysis and has been left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.938 0.0130
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 7.69 0.000309
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Most data for this set is above Limit of Detection (= 0.03). This shows that butanioc acid concentration has two outliers, where one is extreme and has been removed from analysis. Shapiro-Wilk test shows no normality and Levene test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.5.3 Kruskal-Wallis test
6.5.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 butanoic 48 27.4 3 0.0000048 Kruskal-Wallis
6.5.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 butanoic 48 0.555 eta2[H] large
6.5.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 butanoic CTRL PFOS 12 12 -0.0729 0.942 0.942 ns
## 2 butanoic CTRL VAN 12 12 -3.95 0.0000777 0.000315 ***
## 3 butanoic CTRL VAN+PFOS 12 12 -3.50 0.000467 0.000918 ***
## 4 butanoic PFOS VAN 12 12 -3.88 0.000105 0.000315 ***
## 5 butanoic PFOS VAN+PFOS 12 12 -3.43 0.000612 0.000918 ***
## 6 butanoic VAN VAN+PFOS 12 12 0.452 0.651 0.782 ns
6.5.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM butyrate",limits = c(0,3),breaks = seq(0,3,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.6,3,2.4,2.8))
p## Warning: Removed 1 rows containing missing values (`geom_point()`).
p.butanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.butanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.6 3-methyl-Butanoic acid / Isovalerate
6.6.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m3_butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m3_butanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m3_butanoic 12 0.03 0.017
## 2 PFOS m3_butanoic 12 0.033 0.012
## 3 VAN m3_butanoic 12 0.005 0.018
## 4 VAN+PFOS m3_butanoic 12 0.006 0.022
6.6.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains three outliers arising from several data points being below Limit of detection. Furthermore removing extreme outliers does not affect the result or analysis - these have therefore been left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.678 0.00000000551
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.391 0.760
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 3-methyl-butanoic acid concentration has three outliers, which has been left in. Shapiro-Wilk test show no normality but the data has equal variance. Furthermore, very few samples above Limit of Detection (= 0.02) are observed in vancomycin treated samples. We use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.6.3 Kruskal-Wallis test
6.6.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m3_butanoic 48 25.6 3 0.0000117 Kruskal-Wallis
6.6.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m3_butanoic 48 0.513 eta2[H] large
6.6.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m3_butanoic CTRL PFOS 12 12 0.556 5.78e-1 6.94e-1 ns
## 2 m3_butanoic CTRL VAN 12 12 -3.29 9.96e-4 1.67e-3 **
## 3 m3_butanoic CTRL VAN+PFOS 12 12 -3.26 1.11e-3 1.67e-3 **
## 4 m3_butanoic PFOS VAN 12 12 -3.85 1.19e-4 4.05e-4 ***
## 5 m3_butanoic PFOS VAN+PFOS 12 12 -3.82 1.35e-4 4.05e-4 ***
## 6 m3_butanoic VAN VAN+PFOS 12 12 0.0309 9.75e-1 9.75e-1 ns
6.6.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isovalerate",limits = c(0,0.1),breaks = seq(0,0.1,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.092,0.1,0.084,0.076))
p## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.m3b <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m3b,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.7 Pentanoic acid / Valerate
6.7.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$pentanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pentanoic 12 0.047 0.023
## 2 PFOS pentanoic 12 0.048 0.025
## 3 VAN pentanoic 12 0.004 0.015
## 4 VAN+PFOS pentanoic 12 0.007 0.026
6.7.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R26 14 yes no 238. 246. 248 257 259. 265
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains three outliers arising from several data points being below Limit of detection. Furthermore removing extreme outliers does not affect the result or analysis - these have therefore been left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.820 0.00000372
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 1.72 0.177
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that pentanoic acid concentration has one outlier, which has been left in for the analysis. Shapiro-Wilk test shows no normality and Levene test shows equal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.7.3 Kruskal-Wallis test
6.7.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pentanoic 48 27.3 3 0.00000509 Kruskal-Wallis
6.7.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pentanoic 48 0.552 eta2[H] large
6.7.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pentanoic CTRL PFOS 12 12 0.0155 0.988 0.988 ns
## 2 pentanoic CTRL VAN 12 12 -3.76 0.000173 0.000448 ***
## 3 pentanoic CTRL VAN+PFOS 12 12 -3.62 0.000299 0.000448 ***
## 4 pentanoic PFOS VAN 12 12 -3.77 0.000163 0.000448 ***
## 5 pentanoic PFOS VAN+PFOS 12 12 -3.63 0.000282 0.000448 ***
## 6 pentanoic VAN VAN+PFOS 12 12 0.139 0.889 0.988 ns
6.7.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM valerate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.086,0.094,0.11,0.102))
p## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.pentanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.pentanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.8 4-methyl-Pentanoic acid / Isocaproate
6.8.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m4_pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m4_pentanoic))# & !dat$rat_name %in% c("R42","R45"))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m4_pentanoic 12 0.018 0.023
## 2 PFOS m4_pentanoic 12 0.03 0.058
## 3 VAN m4_pentanoic 12 0.035 0.074
## 4 VAN+PFOS m4_pentanoic 12 0.216 0.513
6.8.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R25 13 yes no 339. 340. 353. 364. 348. 358
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 4 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## 5 VAN+PFOS R45 45 yes yes 234. 239. 244. 253. 262. 263
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains five outliers, however, these outliers arise several data points being under Limit of Detection (=0.03). These data points are therefore kept.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.461 4.94e-12
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 1.62 0.198
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 4-methyl-pentanoic acid concentration has five outliers and only few data points are above Limit of Detection (= 0.03) - outliers are therefore kept. Shapiro-Wilk test show no normality but the data has equal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.8.3 Kruskal-Wallis test
6.8.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m4_pentanoic 48 1.32 3 0.724 Kruskal-Wallis
6.8.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m4_pentanoic 48 -0.0381 eta2[H] small
6.8.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m4_pentanoic CTRL PFOS 12 12 -0.165 0.869 0.987 ns
## 2 m4_pentanoic CTRL VAN 12 12 0.0165 0.987 0.987 ns
## 3 m4_pentanoic CTRL VAN+PFOS 12 12 0.875 0.381 0.781 ns
## 4 m4_pentanoic PFOS VAN 12 12 0.182 0.856 0.987 ns
## 5 m4_pentanoic PFOS VAN+PFOS 12 12 1.04 0.298 0.781 ns
## 6 m4_pentanoic VAN VAN+PFOS 12 12 0.859 0.391 0.781 ns
6.8.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isocaproate",limits = c(0,1.78),breaks = seq(0,1.78,0.2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.m4p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m4p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.9 Hexanoic acid / Caproate
6.9.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "hexanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$hexanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL hexanoic 12 0.053 0.041
## 2 PFOS hexanoic 12 0.07 0.058
## 3 VAN hexanoic 12 0.004 0.014
## 4 VAN+PFOS hexanoic 12 0.006 0.021
6.9.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 2 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers, both arising due to majority of data points in the vancomycin treated groups are below Limit of Detection (=0.01). Therefore these outliers are left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.874 0.0000992
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 6.86 0.000687
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that hexanoic acid concentration has two outliers and several datapoint below Limit of Detection. Shapiro-Wilk test shows no normality and Levene test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test with Dunn’s p-value adjustment.
6.9.3 Kruskal-Wallis test
6.9.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 hexanoic 48 28.0 3 0.00000356 Kruskal-Wallis
6.9.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly in published literature are: 0.01- < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
dat.clean %>% kruskal_effsize(FORMULA)## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 hexanoic 48 0.569 eta2[H] large
6.9.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
# pairwise comparisons
pwc <- dat.clean %>%
dunn_test(FORMULA, p.adjust.method = "fdr")
pwc## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 hexanoic CTRL PFOS 12 12 0.717 0.473 0.568 ns
## 2 hexanoic CTRL VAN 12 12 -3.39 0.000699 0.00139 **
## 3 hexanoic CTRL VAN+PFOS 12 12 -3.31 0.000927 0.00139 **
## 4 hexanoic PFOS VAN 12 12 -4.11 0.0000401 0.000168 ***
## 5 hexanoic PFOS VAN+PFOS 12 12 -4.03 0.0000560 0.000168 ***
## 6 hexanoic VAN VAN+PFOS 12 12 0.0779 0.938 0.938 ns
6.9.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM caproate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p## Warning: Removed 15 rows containing missing values (`geom_point()`).
p.hexanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.hexanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 15 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 15 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.10 Heptanoic acid / Enanthate
6.10.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "heptanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$heptanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL heptanoic 12 0.004 0.012
## 2 PFOS heptanoic 12 0 0
## 3 VAN heptanoic 12 0.009 0.033
## 4 VAN+PFOS heptanoic 12 0.008 0.028
6.10.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
This is already done for the whole projectNo significant outliers in the two groups
Normality. the data for each group should be approximately normally distributed.
Homogeneity of variances. the variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R10 10 no no 266. 273. 275. 285. 291. 294
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.399 9.66e-13
Test homogneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
plot(model, 1)- It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.446 0.722
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that heptanoic acid concentration only three data points above Limit of Detection (= 0.03) which is deemed too low for analysis. No final analysis therefore made.
6.10.3 Create figure
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM enanthate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
#p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p## Warning: Removed 25 rows containing missing values (`geom_point()`).
p.heptanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.heptanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 25 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)## Warning: Removed 25 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.SCFA in mM
6.11 SCFA ggarrange
Here all plots from the SCFA analysis is combined into one figure.
params <- readRDS("R_objects/animal_params.RDS")
# Load rdata files with scfa plots
pfiles <- list.files(path = "R_objects/scfa/", pattern = "*.rdata", full.names = TRUE)
lapply(pfiles, load,.GlobalEnv)## [[1]]
## [1] "p.acetic"
##
## [[2]]
## [1] "p.butanoic"
##
## [[3]]
## [1] "p.formic"
##
## [[4]]
## [1] "p.heptanoic"
##
## [[5]]
## [1] "p.hexanoic"
##
## [[6]]
## [1] "p.m2p"
##
## [[7]]
## [1] "p.m3b"
##
## [[8]]
## [1] "p.m4p"
##
## [[9]]
## [1] "p.pentanoic"
##
## [[10]]
## [1] "p.propanoic"
# Create plot
p.all <- ggarrange(p.formic,p.acetic,p.propanoic,p.m2p,p.butanoic,p.m3b,p.pentanoic,p.m4p,p.hexanoic,p.heptanoic,
ncol = 5, nrow = 2,
common.legend = TRUE,
legend = "top",
label.x = 0,
font.label = list(size = 24, face = "bold"),
labels = c("A","B","C","D","E","F","G","H","I","J"),
align = "hv")## Warning: Removed 22 rows containing missing values (`geom_point()`).
## Removed 22 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 25 rows containing missing values (`geom_point()`).
p.all# Save graphics
ggsave(filename = "plots/animal_data/scfa/all.png", p.all, device = "png", dpi = 300, height = 200, width = 400, units = "mm")
ggsave(filename = "plots/animal_data/scfa/all.pdf", p.all, device = "pdf", dpi = 300, height = 200, width = 400, units = "mm")
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.7 SETTINGS
Overview of the parameters and packages that were used for this analysis
7.1 PARAMETERS
The following paramenters were set in for this analysis:
params <- readRDS("R_objects/import_params.RDS")
tmp <- unlist(params)
dat <- data.frame(Parameter = names(tmp), Value = unname(tmp))
kbl(dat, row.names = F) %>% kable_classic(lightable_options = "striped")7.2 PACKAGES
The analysis was run in R version 4.2.2 using the following packages:
pack <- data.frame(Package = (.packages()))
for (i in seq(nrow(pack))) pack$Version[i] <- as.character(packageVersion(pack$Package[i]))
kbl(pack[order(pack$Package),], row.names = F) %>% kable_classic(lightable_options = "striped") | Package | Version |
|---|---|
| ape | 5.7 |
| base | 4.2.2 |
| datasets | 4.2.2 |
| decontam | 1.18.0 |
| dplyr | 1.1.0 |
| forcats | 1.0.0 |
| ggbreak | 0.1.1 |
| ggplot2 | 3.4.2 |
| ggpubr | 0.6.0 |
| graphics | 4.2.2 |
| grDevices | 4.2.2 |
| kableExtra | 1.3.4 |
| lattice | 0.20.45 |
| lubridate | 1.9.2 |
| methods | 4.2.2 |
| pals | 1.7 |
| permute | 0.9.7 |
| phangorn | 2.11.1 |
| phyloseq | 1.42.0 |
| plotly | 4.10.1 |
| purrr | 1.0.1 |
| readr | 2.1.4 |
| rstatix | 0.7.2 |
| stats | 4.2.2 |
| stringr | 1.5.0 |
| tibble | 3.1.8 |
| tidyr | 1.3.0 |
| tidyverse | 2.0.0 |
| utils | 4.2.2 |
| vegan | 2.6.4 |